Arrangement and method for reproducing audio data as well as computer program product
for this



The invention relates to an arrangement for replaying audio data, which audio
data corresponds to text data from a text composed of words, with memory means for storing
the audio data, into which memory means audio data to be stored can be read in a forward
sequence, with control means for controlling the replaying of stored audio data in a
forward mode and in a reverse mode, and with audio replaying means.

The invention further relates to a method for replaying audio data stored in 
memory means, which audio data corresponds to text data from a text composed of words, 
and into which memory means audio data to be stored is read in a forward sequence, during 
which method the replaying of audio data can be controlled in a forward mode and in a 
reverse mode. 

The invention further relates to a computer program product and to a computer 
designed for executing a computer program product of this kind. 



In the manual or automatic transcription of texts, and especially when correcting
texts transcribed automatically by voice recognition systems, it is usual to listen to the
dictated text, stored digitally in the form of audio data, by means of audio replaying means,
e.g. headphones. In the case of texts that have already been transcribed and have to be
corrected, the text corresponding to the stored audio data, already stored as a text file, may
be displayed simultaneously by text display means, e.g. the monitor of a computer
workstation. In particular, it is known to provide the audio data and the text data relating to
each other with corresponding word-marking data, which indicates e.g. the start of each word
and serves as linkage data identifying audio data and text data that correspond to, i.e. match,
each other, so that they can be replayed synchronously, acoustically and visually, in forward
mode. A suitable technology for this purpose is described in e.g. patent document
WO 01/46853 A1. It is also known to visually highlight, in the text section being displayed,
the particular word that is currently being replayed acoustically; this may likewise be
realized using the control data formed by the word-marking data, or linkage data.
The listening to and displaying of words in relation to each other is, however,
enabled only in the forward mode and in the forward sequence. If, starting from a momentary
replay position, a return to a previous text location takes place counter to the forward
sequence, an audio replay may be enabled at the same time, but this replay will likewise run
counter to the forward sequence and will therefore be incomprehensible. If, for example, a
dictation is transcribed automatically or manually and the person undertaking the
transcription and, if applicable, the correction is not the person who dictated it, returning to
previous text locations will be found particularly irritating by this person, since he does not
know the spoken text and since, depending on the available software, the audio data, which
is stored digitally in memory means, is presented to him in rapid succession, counter to the
forward sequence, in an incomprehensible form. This person must then switch manually to
replay in the forward sequence, listen to the dictation passages in question and, in the case of
a previous automatic transcription, check the associated text words visually, since an audio
replay synchronous with the visually displayed text is possible only in this forward mode.
This requires a comparatively large amount of time, which may impair the concentration of
the person processing the text and also reduces his processing efficiency.

Patent document US 2002/0062214 A1 describes a text-marking system in which
word groups are displayed on a computer monitor and switching fields are provided for the
control of different work steps. Two separately activated switching fields are provided so that
a jump can be made from a marked word, visually highlighted in a line of text, to the word
immediately preceding or immediately following it, in order to visually highlight this word
and simultaneously replay it acoustically. However, this kind of control is extremely
laborious and time-consuming if, starting from a particular word, a text location a relatively
long way before it, e.g. 10 or 20 words earlier, is sought, since the appropriate switching
field then has to be clicked manually again and again.




It is an object of the invention to remedy this situation and to provide an
arrangement and a method that enable spoken text passages in stored audio data to be sought
rapidly and in a targeted manner, with as few manual control interventions as possible on
the part of the person undertaking the processing.
In accordance with a first aspect, to achieve the object cited above, the
invention provides an arrangement for replaying stored audio data, which audio data
corresponds to text data from a text composed of words, with memory means for storing the
audio data, into which memory means audio data to be stored can be read in a forward
sequence, with control means for controlling the replaying of stored audio data in a forward
mode and in a reverse mode, and with audio replay means, wherein the control means are set
up in such a way that, during a playback of audio data in reverse mode, starting from the
particular momentary replay position in the audio data, they automatically initiate a
backward jump, counter to the forward sequence, over a return distance corresponding to the
length of at least roughly two words, to a target position, and then, starting from the
particular target position, initiate a replay of audio data in the forward sequence for only
part of the return distance.

In accordance with a second aspect, the invention provides a method for
replaying audio data stored in memory means, which audio data corresponds to text data
from a text composed of words, and into which memory means audio data to be stored is read
in a forward sequence, under which method the replaying of audio data in a forward mode
and in a reverse mode can be controlled, wherein, during a playback of audio data in reverse
mode, starting from the particular momentary replay position in the audio data, a backward
jump takes place automatically, counter to the forward sequence, over a return distance
corresponding to the length of at least roughly two words, to a target position, and then,
starting from the particular target position, a replay in the forward sequence is undertaken for
only part of the return distance.

Using the method in accordance with the invention, a search for particular text
passages in the audio data can be undertaken more rapidly and efficiently than in the prior
art. If, for example, during the transcription or correction of a text by a person, this person
notices on reaching a certain text location that a previously transcribed or corrected text
location 10 or 20 words earlier is unclear, inconsistent or wrong, a corresponding search in
reverse mode can be undertaken extremely rapidly and fully automatically once the method in
accordance with the invention has been started: with computer assistance, a jump takes place
automatically, over the specified return distances, to target positions located further back in
the text, and each jump is followed by an acoustic replay in the forward sequence of only a
specified part of the particular return distance. As a result, the audio playback is
comprehensible, so the person in question has no problems of comprehension. If a
corresponding transcribed text is already available, the jump backwards in the text
represented by the audio data preferably uses word-marking data, which normally indicates
the start of a new word, as control characters. If no transcribed text is yet available, the
return distance for the automatic backward jump in the audio data is estimated, e.g. to
correspond to the mean data length or duration of at least two words, for example on the
basis of a forward replay time of one or two seconds. In this context, the particular return
distance therefore does not have to correspond precisely to the length of several actually
spoken words, since the audio data may also be divided into segments according to averaged
"word lengths". The same applies to the duration of the replay in the forward sequence that
follows each backward jump.

Accordingly, under the method in accordance with the invention, a jump
backwards by (roughly) two or three words may occur, followed by an automatic audio
replay of (roughly) one word, wherein the word currently being replayed will be one of the
two or three words over which the backward jump in the audio data takes place in the
following procedural step. It is, however, also conceivable to jump backwards by a return
distance corresponding to a larger number of words and, when replaying in the forward
sequence, to replay only one word at a time of the spoken text, so that, for example, only
every fourth or fifth word is replayed. However, multiple words may also be replayed in
the forward sequence.
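The relationship between the return distance and the words that end up being heard can be checked with a short sketch. It assumes, as a hypothetical reading of the text, that each return distance is counted in whole words from the end of the word just replayed and that exactly one word is replayed after each jump; `reverse_schedule` and its parameters are illustrative names.

```python
from typing import List


def reverse_schedule(first_word: int, jump_words: int = 2) -> List[int]:
    """Word numbers (1-based) heard, in order, during one reverse search run.

    first_word: the word replayed after the first backward jump.
    jump_words: return distance in words, counted from the end of the word
    just replayed (assumed interpretation).
    """
    if jump_words < 2:
        raise ValueError("the return distance spans at least two words")
    heard = []
    word = first_word
    while word >= 1:
        heard.append(word)
        # After replaying word k the position is at the end of word k; a
        # jump back over `jump_words` words lands on the start of word
        # k + 1 - jump_words, which is then replayed forwards.
        word = word + 1 - jump_words
    return heard
```

With `jump_words=2` every word is heard in turn, as in Fig. 4A; with `jump_words=3` only every second word is heard, e.g. `reverse_schedule(7, 3)` gives `[7, 5, 3, 1]`, and with `jump_words=5` only every fourth word.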

The audio replay may occur simply word-wise, i.e. until the next word-marking
data is reached, if such data is already available, this next word-marking data identifying
the start of the next word. However, a segment-wise replay with a fixed replay time may also
be undertaken, e.g. in the range of 0.5 to 1.5 seconds, corresponding e.g. to an averaged word
duration, which is stipulated by a timing circuit. This will be the case primarily if no
transcribed text with corresponding word-marking data is yet available. It is also
conceivable for one word to be replayed as a whole and the next marked word only partly, at
its start. It is further conceivable for the person undertaking the processing to choose
between the different options cited above.
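The two ways of ending a forward replay segment, word-wise at the next word-marking position or after a fixed replay time, can be sketched as follows. The sketch assumes positions in seconds and the same sorted marker list as above; `replay_end`, `fixed_s` and `overlap_s` are hypothetical names, and falling back to the fixed time for the last word (which has no following marker) is an assumption.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def replay_end(start: float,
               markers: Optional[Sequence[float]] = None,
               fixed_s: float = 1.0,
               overlap_s: float = 0.0) -> float:
    """End position (seconds) of one forward replay segment in reverse mode.

    Word-wise: stop at the next word-marking position after `start`,
    optionally running `overlap_s` seconds into the next word.
    Segment-wise: stop after the fixed replay time `fixed_s`.
    """
    if markers:
        i = bisect_right(markers, start)  # first marker strictly after start
        if i < len(markers):
            return markers[i] + overlap_s
    # No markers (or no following marker): fixed replay time, as a timing
    # circuit would stipulate.
    return start + fixed_s
```

For example, with the markers used earlier, a segment starting at M5 (1.5 s) ends at M6 (1.8 s) word-wise, whereas `replay_end(2.0, fixed_s=1.5)` ends at 3.5 s.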

During the reverse search run described, it is further preferred in accordance
with the invention for the particular audio replay in the forward sequence to take place at an
adjustable speed, so that the person undertaking the processing can make the search run as a
whole proceed more quickly or more slowly as required. The backward jump to the target
position further back in the text being replayed, as may be stipulated e.g. by the
word-marking data mentioned, can be undertaken very quickly, i.e. virtually without loss of
time, and no acoustic audio replay need be undertaken in this "fast rewind" mode.

WO 2004/036541 PCT/IB2003/004497

The method in accordance with the invention may be used to very particular
advantage in conjunction with a transcription system in which dictations arriving in a
manner that is conventional per se, e.g. via a communication network such as a LAN, WAN
or the Internet, or on sound carriers, are converted automatically by voice recognition means
into a text file, which is then checked and, if applicable, corrected using word processing
software while the dictation audio data is listened to. The words in the audio file on the one
hand and the associated words in the text file on the other are linked on the basis of the
assigned word-marking data, which is therefore also designated linkage data. During the
replay, the word currently being replayed acoustically is also highlighted visually on the
text display means, e.g. by underlaying it with a light background. The invention here
provides a "synchronous reverse replay mode", in which the words from the text file are
visually highlighted one after the other, counter to the forward sequence, and, synchronously
with each visually highlighted word, the corresponding word in the audio data is replayed
acoustically in the recording sequence, i.e. comprehensibly. The advantage achieved is that
visually displayed words can be checked without problems with the aid of a comprehensible
audio replay of the corresponding audio data. Locating a position in the text is also
significantly simplified, and the overall efficiency of the correction of transcribed dictations
is increased.

The invention can thus be used to advantage in a typical transcription system of
this kind, in which dictations are received by receiver stations and automatically transcribed
by transcription stations, after which the transcribed dictations are corrected manually at a
correction station and, finally, text files corresponding to the received dictations are
delivered by a delivery station. However, the invention may, of course, also be used in a
transcription system realized by a single computer, especially a personal computer, with
which the steps mentioned, namely the reception, the automatic transcription, the correction
and finally the delivery of the text data, may be undertaken.

As already mentioned, the invention may also be used for the manual transcription
of a dictated text, if the text file is produced manually with a word processing system while
the dictation, i.e. the audio data, is being listened to, preferably with assignment of linkage
data for the audio data and the text data corresponding to the above-mentioned word-marking
data, wherein corrections are also made, if applicable, following the production or
transcription procedure. In particular, therefore, the invention may also be realized in a
mobile dictation apparatus or an audio replay apparatus of digital design.

In accordance with a third aspect, the invention also provides a computer
program product that can be loaded into a memory of a computer and comprises sections of
software code such that, when they are executed after loading into the computer memory,
the method in accordance with the invention can be implemented with the computer.

Finally, in accordance with a fourth aspect, the invention provides a computer
with a processing unit and an internal memory, which computer is designed to execute the
computer program product in accordance with the invention.



The invention will be further described with reference to examples of
embodiments shown in the drawings, to which, however, the invention is not restricted.

Fig. 1 shows, schematically, a routine for the synchronous replaying of audio
data and text data in a forward mode.

Fig. 2 shows, schematically, a routine for the replaying of audio data and text
data with mutual assignment in a reverse mode according to the prior art.

Fig. 3 shows, schematically, a routine for the replaying of audio data and text
data in a reverse mode in accordance with the invention.

Fig. 4A shows a routine similar to the routine shown in Fig. 3 for the audible
replaying of audio data in a reverse mode, illustrated with a text that has previously been
transcribed automatically and requires correction.

Fig. 4B shows the text corrected using the routine shown in Fig. 4A during a
replay in reverse mode, as a sequence of words displayed on e.g. a monitor.

Fig. 5 shows, schematically in the form of a block diagram, a transcription
system with an arrangement for audio replay, with which a "synchronous reverse mode
replay" in accordance with the schematic illustrations shown in Fig. 3 and Fig. 4A can be
undertaken.

Fig. 6 shows, in the form of a block diagram and in more detail, the system
components of the transcription system shown in Fig. 5 that are provided for a "synchronous
reverse mode replay".

Fig. 7 shows a modified, somewhat simplified routine for a replay in reverse
mode, similar to the routine shown in Fig. 4A.
Fig. 8 shows, in the form of a flowchart, a variant of a method for synchronous
replay in reverse mode.

Fig. 1 schematically illustrates a routine for replaying audio data A1..A4
(generally Ai) synchronously with text data T1..T4 (generally Ti) in a forward mode, wherein
the replaying (reading out) of the data takes place in the same sequence, or the same
direction, as the recording (reading in) of the data (from left to right in Fig. 1). This sequence
is always designated the forward sequence. Audio data Ai and text data Ti, in associated
pairs, represent a succession of words A1/T1..A4/T4 from a text. A word-marking code, or
word-marking data M1..M5 (generally Mi), which at the same time forms linkage data for
the synchronous replaying of the audio data Ai and text data Ti, is assigned to the start of
each word. During replaying, text data T1, T2..T4 (i.e. successive words) are activated one
after the other in accordance with the arrows 1, 2, 3, 4 shown at the bottom of Fig. 1 and
highlighted visually on a display means (not shown in Fig. 1), and, synchronously with this,
the particular word is replayed acoustically from the corresponding digital audio data A1,
A2..A4, in accordance with the steps indicated at the top by arrows 1, 2, 3 and 4. This
simultaneous visual and acoustic replaying of words from the text using the marking or
linkage data Mi in a forward mode represents the prior art, which is known per se.

Fig. 2 likewise schematically illustrates a known routine for replaying in a
reverse mode. Here, the words T4, T3, T2, T1 are activated backwards one after the other,
from right to left in Fig. 2, as indicated by the bottom arrows 1, 2, 3 and 4, and highlighted
visually on the display means (not shown). Simultaneously, using the word-marking or
linkage data M5, M4, M3 and M2, the corresponding audio data A4, A3, A2 and A1, i.e. the
words, are replayed counter to the forward sequence, as indicated by the top arrows 1, 2, 3
and 4 in Fig. 2. This acoustic replaying thus takes place counter to the recording sequence,
i.e. counter to the recording direction, and therefore results in an incomprehensible audio
signal. This impedes the finding of particular text locations, which is then possible only on
the basis of the visual display. That, however, conflicts with the normal way of working
during the transcription or correction of texts from a dictation, since persons undertaking a
transcription or the correction of a transcription concentrate on the acoustically replayed
audio signal when searching both backwards and forwards and, even when processing
directly, write or correct the text according to the audio signal they hear.
If no transcribed text (with words T1..T4) is yet available, finding words located
further back by means of the audio data Ai alone is extremely cumbersome.

In contrast to these known techniques, the individual audio data Ai can now also
be replayed acoustically in the forward sequence, i.e. in the recording direction, in reverse
mode: in the case of the above sequence A4, A3, A2 and A1, for example, each of these words
is played forwards, as shown schematically in Fig. 3 by arrows 1, 2, 3, 4 above the audio data
A4, A3, A2, A1. Simultaneously, if corresponding text data T4, T3, T2, T1 is already
available, a visual display of the words represented by the text data Ti is initiated in
accordance with arrows 1, 2, 3 and 4 at the bottom of Fig. 3.

Fig. 4A illustrates in detail how a backward jump in the audio data A1 to A6
and in the associated text data T1 to T6 takes place during a "synchronous replaying in
reverse mode" of this kind, and how a comprehensible audio replay is generated. The example
used is a text passage in a dictation which should correctly read "TO BE OR NOT TO BE"
(see also Fig. 4B), but which has been transcribed by an automatic transcription system in
the form shown in Fig. 4A, namely "TWO BEE OR NOT TWO BEE". In Fig. 4A, this word
sequence is shown in a bar 11, which appears as such on, for example, a visual display means
(not shown), e.g. a monitor, that displays the individual words, i.e. the text data T1,
T2..T6. These words are also stored as corresponding audio data A1, A2..A6 in digital form
in the audio-data memory means (not shown in Fig. 4A), from which they can be read for
audio replay. The word-marking or linkage data Mi provided for this purpose is again
indicated schematically in Fig. 4A at M1, M2..M7.

Specifically, as shown in Fig. 4A, a backward jump takes place from a
momentary replay position located further on in the spoken or transcribed text Ti (further to
the right in Fig. 4A) to a previous target position, e.g. to the start of the word T6/A6 ("BEE")
as identified by linkage data M6. This backward jump is indicated by arrow 1A in Fig. 4A.
Subsequently, precisely this word A6 is replayed in the forward sequence from the stored
audio data Ai, see arrow 1B. At the end of the word A6/T6 (or at the next word, marked by
marking data M7), a backward jump (see arrow 2A) takes place automatically, in this case
over a minimum return distance corresponding to the length of two words, A5 + A6 or
T5 + T6, to the start of word T5 (text data) or A5 (audio data) as indicated by linkage
data M5, after which the word A5 is replayed as an audio signal in the forward sequence as
indicated by arrow 2B. This procedure is repeated automatically with the words A4/T4,
A3/T3 etc., see arrows 3A (backward jump to target position M4), 3B (acoustic replay of
word A4 in the recording direction) etc., as far as arrows 6A, 6B. In Fig. 4A, therefore,
the arrows 1A, 2A, 3A..6A indicate the return distances, whereas the arrows 1B, 2B,
3B..6B indicate those sections of the return distances for which an audio replay in the
forward sequence takes place.
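The run shown in Fig. 4A can be reproduced in a few lines, pairing each highlighted text word Ti with the audio span of the word Ai that is replayed forwards. The marker times used below are invented purely for illustration, `synchronous_reverse` is a hypothetical name, and the sketch covers only the two-word return distance of Fig. 4A.

```python
from typing import List, Tuple


def synchronous_reverse(words: List[str], markers: List[float],
                        first_word: int) -> List[Tuple[str, str, float, float]]:
    """Steps of a synchronous reverse replay with two-word backward jumps.

    words[k - 1] is text word Tk; markers[k - 1] is marker Mk in seconds.
    markers has one entry more than words, so markers[k] ends word k.
    Each step is (jump arrow and target marker, highlighted word,
    audio start, audio end).
    """
    if len(markers) != len(words) + 1:
        raise ValueError("need one marker per word plus one end marker")
    steps = []
    for step, k in enumerate(range(first_word, 0, -1), start=1):
        # Arrow <step>A: backward jump to the start of word k (marker Mk).
        # Arrow <step>B: replay audio Ak forwards while Tk is highlighted.
        steps.append((f"{step}A->M{k}", words[k - 1],
                      markers[k - 1], markers[k]))
    return steps


words = "TWO BEE OR NOT TWO BEE".split()
markers = [0.0, 0.4, 0.7, 1.0, 1.5, 1.8, 2.3]  # hypothetical M1..M7
for arrow, word, start, end in synchronous_reverse(words, markers, 6):
    print(f"{arrow}: highlight {word!r}, play {start:.1f}-{end:.1f} s")
```

The highlighted words come out as BEE, TWO, NOT, OR, BEE, TWO, each paired with a forward-played audio span, which is what makes the reverse search audible in a comprehensible form.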

During the described section-wise backward jumping and listening to the
individual words Ai in the forward sequence, the particular word may be corrected directly,
or a return is made to the start of the particular text passage, after which the entire word
sequence T1 to T6 can be corrected in the normal manner by listening in the forward
sequence with the visual display also in the forward sequence, so that the correct text, as
shown in bar 11' in Fig. 4B, is obtained.

During the acoustic replaying of the individual audio data A1..A6, the
corresponding text data T1..T6 is visually highlighted on the monitor, e.g. by displaying it
against a light background.

It is also indicated schematically in Fig. 4A, by a broken-line extension of
arrow 2B, that the acoustic replaying may also extend past the word in question to include
part of the next word, i.e. a "word-overlapping" audio replay may be provided. This happens,
for example, when the next marking data Mi in line, e.g. M6, is not used as the control code
for terminating the particular audio replay, and a fixed replay time based on time counting is
provided instead. The fixed replay time may be e.g. one second or 1.5 seconds, or even a
little less than one second. A predetermined replay duration of this kind should be provided
in particular if no transcribed text, and therefore no word-marking data that could serve as a
control code, is yet available.

In a similar manner, the return distance for the backward jump may also be
calculated to correspond to fixed time spans, e.g. twice or three times a mean word length.

Fig. 5 shows an example of an arrangement 12 for replaying audio data Ai
synchronously with the replaying of text data Ti, which arrangement comprises a
transcription data processing device 13. With this arrangement 12, in a manner that is normal
per se, a dictation file is transmitted from users 14.1..14.N via a communication medium,
e.g. a communication network 15 such as a LAN, WAN or the Internet, to the arrangement 12,
is received via a communication device, a modem 16 in the present case, and is then sent to
voice recognition means 17. It should be mentioned that the communication means may also
be realized by a so-called "Private Branch Exchange", or PBX for short.
The voice recognition means 17, in which voice recognition software that is
normal per se is implemented, automatically transcribes the dictation file into a text file; it
generates the word-marking data or linkage data Mi corresponding to the individual items of
audio data Ai from the associated audio file, and the individual words of text data Ti and the
audio data Ai are stored in text memory means 18 and audio memory means 19 respectively.
As already stated above, the words in the text data Ti and in the audio data Ai (text memory
means 18 and audio memory means 19 respectively) that correspond to each other are
permanently assigned, or linked, to each other by means of the word-marking data Mi. Via
this linkage, audio data Ai and text data Ti belonging together can be invoked and replayed
in pairs by control means 20. The visual replaying of text data Ti initiated by control
means 20 takes place via word processing means 21 on display means 22, in particular a
computer monitor.

The acoustic replaying of audio data Ai takes place by reading the digitally
stored audio data Ai from memory means 19 and sending it to a replay circuit 23 for an
electro-acoustic transducer 24, for which headphones are generally used. Reading hereby
takes place in the forward sequence.

The entire routine of jumping backwards from the momentary replay position
to previous target positions in the text and of synchronous forward replay is controlled by
sections of software code stored in an internal memory 25. A conventional keyboard or
similar device serves as inputting means 26, forming the interface with the user for activating
the particular control procedures and for the various inputs in the course of transcription or
text correction. A footswitch operating device may also be provided to control forward and
backward replay.

The replay circuit 23, which may, in a manner that is normal per se, comprise
a digital/analog converter, an amplifier and similar components, and the transducer 24
together form audio replay means 27.

Fig. 6 shows in greater detail how control means 20 controls the replaying, or
reading, of audio data Ai from audio memory means 19 and its sending to replay means 27,
in association with text data Ti and with word-marking data Mi stored e.g. in text memory
means 18. A central control circuit 28 is connected to audio memory means 19, either
directly or via defining means 29, which defines the particular return distance when jumping
backwards in audio data Ai. A timing circuit 30 is also connected to control circuit 28 to
enable the acoustic replaying of the audio data in the forward sequence in reverse mode, as
described above with reference to Fig. 4A, over a predetermined, fixed duration. Timing
circuit 30 may be, for example, a normal clock generator, in which case control circuit 28
measures the particular desired duration, which can be set e.g. via the inputting means 26,
by counting clock pulses. A replay duration (see arrow 2B in Fig. 4A) of e.g. one second or
1.5 seconds can be set in this way. Alternatively, if word-marking data Mi is already
available, a "word-wise" replaying may be selected, in which control circuit 28 terminates
the replay procedure when the next marking data Mi in line is reached. Timing signals,
especially clock pulses, emitted by timing circuit 30 may also be used as the basis for
determining the return distances in defining means 29.
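A software stand-in for timing circuit 30 and the pulse counting in control circuit 28 might look like the sketch below. It is illustrative only: the clock frequency, the class name `ReplayTimer` and the helper `return_distance_pulses` are assumptions, not components named in the text.

```python
class ReplayTimer:
    """Hypothetical software model of timing circuit 30 plus pulse counting.

    A forward replay segment ends once the number of counted clock pulses
    reaches the number corresponding to the set replay duration.
    """

    def __init__(self, clock_hz: int, duration_s: float) -> None:
        self.target = round(clock_hz * duration_s)  # pulses to count
        self.count = 0

    def tick(self) -> bool:
        """Count one clock pulse; return True when the replay should stop."""
        self.count += 1
        return self.count >= self.target


def return_distance_pulses(words: int, mean_word_s: float,
                           clock_hz: int) -> int:
    """Return distance expressed in clock pulses, from a mean word length."""
    return round(clock_hz * words * mean_word_s)
```

With an assumed 1 kHz clock, a replay duration of 1.5 seconds corresponds to 1500 counted pulses, and a return distance of two one-second "words" to 2000 pulses.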

Preferably, it is further possible to set the return distance for section-wise
backward jumping to target positions in the previous text; skipping more than two words at
a time, e.g. three, four or five words, is conceivable, in which case the corresponding number
of marking data or linkage data Mi must be counted. For this purpose, control circuit 28 is
connected to counting means 31.

Like timing circuit 30, this counting means 31 may, of course, be realized in
software with the aid of control means 20, as may defining means 29, which may be
realized by corresponding addressing in memory means 19. Control circuit 28 in accordance
with Fig. 6 then coincides with control means 20 in accordance with Fig. 5. In addition,
setting means 32 is provided for setting the speed of the audio replay by replaying
means 27.

It should be mentioned that the word-marking data Mi may also be stored as a
whole in audio memory means 19, or divided up between text memory means 18 and audio
memory means 19.

For the purpose of better illustration, Fig. 7 shows schematically the procedure
when jumping backwards by more than two words at a time, e.g. by three words at a time
(see arrow 1A, in the reverse direction), wherein each backward jump of this kind is followed
by an audio replay of the following word in the forward direction, i.e. the forward
sequence (see e.g. arrow 1B in Fig. 7). In this manner, in the example shown in Fig. 7, only
the words identified by the marking data M7, M5, M3, M1 (in this order) are replayed
acoustically, as also indicated by the dot above these reference signs in Fig. 7. The words to
which marking data M8, M6, M4 and M2 are assigned are, however, skipped as regards
acoustic replaying.

More than three words at a time may, of course, also be skipped during 
backward jumping, so only every third, fourth etc. word will then be replayed during the 
subsequent acoustic replaying. 
Fig. 8 shows a flowchart illustrating an example of the procedure during the
above-mentioned synchronous reverse replaying and forward replaying. The flowchart also
comprises configuration stages and calculation stages that precede the actual backward jump
procedure and forward replay procedure.

According to Fig. 8, starting at a block 33, options O1 to O5 are configured with
regard to the audio replaying and the calculation of the return distances 1A, 2A, 3A..6A for
backward jumping in the audio data, as described above. These options O1 to O5 may be
provided as follows, for example:

O1 - This option O1 is selected if voice recognition means 17 is provided and used to create the text data Ti automatically from the audio data Ai, wherein the described word-marking data Mi is then also defined automatically as linkage data by voice recognition means 17.

O2 - This option O2 relates to the case where manually transcribed text data Ti is to be used to define the length of individual segments or "words" of the audio data Ai. In accordance with option O2, a fixed length is provided for all audio segments, which length is calculated by control means 20 from the total time duration of audio data Ai and the number of transcribed words by simple division. The individual audio segments or words may then be "numbered", i.e. provided with addresses or indices, in order to use the numbers or addresses to determine the return distances 1A, 2A, 3A ... 6A and/or the sections 1B, 2B, 3B ... 6B to be replayed acoustically.

O3 - This option O3, which is very similar to option O2, is selected if, likewise on the basis of text data Ti transcribed manually from the heard audio data Ai, the audio segments are calculated by control means 20 with variable lengths on the basis of the syllables of the words and the total length of audio data Ai. All syllables are hereby assumed to be of the same length.

O4 - Option O4 may be selected if no text data Ti is yet available, wherein the lengths of the audio segments or words of audio data Ai are calculated on the basis of audio-energy-profile information.

O5 - With this option O5, audio segments of fixed length are assumed, e.g. segment lengths of one second, wherein the segments may also overlap, e.g. by 1/3 second.
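Options O2, O3 and O5 reduce to simple arithmetic on the total duration of audio data Ai. The following Python sketch assumes durations in seconds, represents each audio segment as a (start, end) pair, and, for option O3, assumes that the syllable count of each transcribed word is already known (how syllables are counted is not addressed here). The function names are hypothetical. Option O4 is not sketched, since the source does not say how the audio energy profile is evaluated.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) of one audio segment, in seconds


def fixed_word_segments(total_duration: float, word_count: int) -> List[Segment]:
    """Option O2: every word gets the same length, total duration / word count."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    length = total_duration / word_count
    return [(i * length, (i + 1) * length) for i in range(word_count)]


def syllable_word_segments(total_duration: float,
                           syllables_per_word: List[int]) -> List[Segment]:
    """Option O3: all syllables are assumed equally long, so each word's
    length is proportional to its number of syllables."""
    total_syllables = sum(syllables_per_word)
    if total_syllables <= 0:
        raise ValueError("at least one syllable is required")
    syllable_length = total_duration / total_syllables
    segments: List[Segment] = []
    start = 0.0
    for count in syllables_per_word:
        end = start + count * syllable_length
        segments.append((start, end))
        start = end
    return segments


def overlapping_segments(total_duration: float, length: float = 1.0,
                         overlap: float = 1 / 3) -> List[Segment]:
    """Option O5: fixed-length segments (e.g. 1 s) overlapping by e.g. 1/3 s."""
    if total_duration <= 0:
        raise ValueError("total_duration must be positive")
    if not 0 <= overlap < length:
        raise ValueError("overlap must lie in [0, length)")
    step = length - overlap
    segments: List[Segment] = []
    i = 0
    while True:
        start = i * step                       # avoids accumulating rounding errors
        end = min(start + length, total_duration)
        segments.append((start, end))
        if end >= total_duration:              # last segment reaches the end
            break
        i += 1
    return segments


print(fixed_word_segments(6.0, 3))              # [(0.0, 2.0), (2.0, 4.0), (4.0, 6.0)]
print(syllable_word_segments(6.0, [1, 2, 3]))   # [(0.0, 1.0), (1.0, 3.0), (3.0, 6.0)]
print(len(overlapping_segments(2.0)))           # 3
```

Numbering the returned segments by their list index gives the "addresses or indices" mentioned for option O2.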

Speeds for normal replaying in the forward sequence, for rapid backward jumping, for rapid forward replaying and for section-wise audio replaying in the forward sequence during backward jumping may also be selected at block 33.
At a block 34, the reverse-mode replaying in question here is then started, e.g. by actuating a corresponding key on the inputting means 26. At a block 35 it is then automatically queried whether the user has specified an overall return distance over which the backward jump in audio data Ai is to take place in total in reverse mode. If this is not the case (see output N of block 35), an appropriate overall distance for the backward jump is automatically stipulated as a default value, e.g. a backward jump to the start of the audio data, see block 36 in Fig. 8. If, however, the user has made an appropriate entry (output Y of block 35), the selected overall return distance is used as the basis at a block 37.

The two branches of the flowchart of Fig. 8 thus arrived at then come together again and, at a node 38, an arrow illustrates symbolically that the steps that now follow are repeated in the event of a text amendment before the momentary audio position, without the reverse mode (see block 34) being terminated.

At a block 39, it is then queried whether voice-recognition output data is available, i.e. whether text files Ti transcribed automatically using voice recognition software in voice recognition means 17 are available. If this is the case (see output Y of block 39 in Fig. 8), it is queried at a block 40 whether option O1 (see block 33) has been specified. If this is the case (output Y of block 40), the audio segments or words in the audio data Ai are then calculated at a block 41 in a manner corresponding to the known and edited text words in text data Ti.

If, however, voice recognition is not used (output N of block 39), it is queried at a block 42 whether or not text data Ti is present before the momentary audio position. If this is not the case (output N of block 42), i.e. if no text data Ti is yet available, the length of the audio segments is estimated at a block 43 on the basis of the overall length of the audio data or of the audio energy profile, see options O4 and O5 above.

If, however, the check at block 42 shows that text data Ti is already present before the momentary audio position (see output Y of block 42), it is then queried at a block 44 whether option O2 or O3 was configured at block 33. If neither option O2 nor option O3 was configured (output N of block 44), the estimation described above in connection with block 43 likewise takes place. If option O2 or O3 was configured (output Y of block 44), the length of the audio segments or words of the audio data Ai is then estimated at a block 45 on the basis of the overall length of the audio data and the number of words or syllables (options O2 and O3).
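The branching of blocks 39 to 45 can be summarized as a small decision function. This Python sketch is illustrative only: the names `Option` and `select_segmentation_block` are hypothetical, and because the source does not describe where output N of block 40 leads, the sketch assumes that case falls through to the block 42 check (marked in the code).

```python
from enum import Enum


class Option(Enum):
    """Options configured at block 33 of Fig. 8."""
    O1 = 1  # word-marking data from voice recognition means 17
    O2 = 2  # fixed segment length: total duration / number of words
    O3 = 3  # segment length proportional to the syllable count
    O4 = 4  # segment lengths from the audio energy profile
    O5 = 5  # fixed-length (possibly overlapping) segments


def select_segmentation_block(recognizer_output_available: bool,
                              option: Option,
                              text_before_position: bool) -> int:
    """Return the number of the Fig. 8 block that determines the audio segments."""
    # Blocks 39 and 40: recognized text exists and option O1 was configured.
    if recognizer_output_available and option is Option.O1:
        return 41  # segments follow the recognized and edited text words
    # Assumption: output N of block 40 is not described in the source;
    # this sketch lets it fall through to the block 42 check.
    # Block 42: is text data present before the momentary audio position?
    if not text_before_position:
        return 43  # estimate from overall length or energy profile (O4, O5)
    # Block 44: was option O2 or O3 configured?
    if option in (Option.O2, Option.O3):
        return 45  # estimate from overall length and word or syllable count
    return 43


print(select_segmentation_block(True, Option.O1, False))   # 41
print(select_segmentation_block(False, Option.O3, True))   # 45
```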

In area 46 of the flowchart shown in Fig. 8, outlined with broken lines, the actual steps of the backward jump and of the section-wise replaying in the forward sequence are carried out. At a block 47, the replaying of a word or segment of audio data Ai in the forward sequence, and thus in a comprehensible manner, is started, the replay taking place at the speed set at block 33. At a block 48, it is then queried whether the end of the word or segment of audio data Ai to be replayed in the forward sequence has been reached. If this is not the case (output N), replaying is continued at a block 49 until the query at block 48 eventually reveals that the end of the word or segment has been reached (output Y of block 48). At a block 50, a backward jump in audio data Ai to the next specified target position then takes place, e.g. over a return distance corresponding to the length of three words. It is subsequently queried at a block 51 whether the specified starting position (see blocks 36 and 37) has been reached. If this is not the case (output N), the procedure returns to block 47. If, however, the starting position, i.e. the end of the reverse mode, has been reached (output Y of block 51), the reverse mode is terminated at a block 52.

Provision is preferably also made for the procedure described, which runs automatically under the control of control means 20, to be terminated manually at any time before the specified end is reached, by means of a "STOP" input at inputting means 26.

It should be mentioned that, following correction of the text, a redefinition of 
the word-marking data Mi or a recalculation of the length of the audio segments may be 
necessary in some circumstances. 

It should further be mentioned that, following a backward jump over more than two words, e.g. four, five, six or more, more than one word, e.g. two, three or four words, may be replayed acoustically in a satisfactory manner.
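The loop of area 46 in Fig. 8, together with the manual STOP and the replaying of several words per section, can be sketched in Python as follows. This is a hedged illustration, not the implementation of control means 20: list indices stand in for the word-marking data Mi or the estimated segments, and the names `run_reverse_mode`, `play`, `stop_requested`, `jump` and `section_size` are hypothetical.

```python
import bisect
from typing import Callable, List, Tuple

Segment = Tuple[float, float]  # (start, end) of a word or segment, in seconds


def run_reverse_mode(segments: List[Segment],
                     first_index: int,
                     start_position: float,
                     jump: int = 3,
                     section_size: int = 1,
                     play: Callable[[Segment], None] = lambda seg: None,
                     stop_requested: Callable[[], bool] = lambda: False) -> List[int]:
    """Sketch of area 46 of Fig. 8; returns the indices of the replayed segments.

    `segments` must be sorted by start time. `start_position` is where reverse
    mode ends (block 36 default or block 37 user entry).
    """
    if jump <= section_size:
        raise ValueError("jump must exceed section_size to move backwards")
    starts = [start for start, _ in segments]
    # Index of the segment containing start_position, clamped to the first one.
    target = max(0, bisect.bisect_right(starts, start_position) - 1)
    replayed: List[int] = []
    index = first_index
    # Simplification: the STOP input is only checked between sections.
    while not stop_requested():
        for i in range(index, min(index + section_size, len(segments))):
            play(segments[i])          # blocks 47 to 49: replay forward to the end
            replayed.append(i)
        if index <= target:            # block 51: starting position reached
            break                      # block 52: reverse mode terminated
        # Block 50: jump back `jump` segments from the end of the section.
        index = max(index + section_size - jump, target)
    return replayed


words = [(float(i), float(i + 1)) for i in range(8)]      # eight 1-second words
print(run_reverse_mode(words, 6, 0.0))                    # [6, 4, 2, 0]
print(run_reverse_mode(words, 6, 0.0, jump=5, section_size=2))  # [6, 7, 3, 4, 0, 1]
```

With zero-based indices, the first call reproduces the Fig. 7 order M7, M5, M3, M1; the second hears two words after each backward jump of five words.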



